Recommending clone refactoring method based on decision tree

SHE Rongrong, ZHANG Liping, HOU Min, YAN Sheng

Journal of Computer Applications 2018, 38 (7): 2037-2043. DOI: 10.11772/j.issn.1001-9081.2017122997

Extensive use of cloned code complicates long-term software maintenance and can even introduce errors. To address this, a decision-tree-based classifier was proposed to recommend clones for refactoring. First, clone detection was performed using NiCad. Second, features related to the clone relationship, the cloned code fragments and the clone context were collected. Third, a decision tree classifier was trained on these features. Finally, the classification results were evaluated by K-fold cross-validation. Experiments were conducted on nearly 600 clones from five open-source software systems. The experimental results show that the proposed method achieves 80% accuracy when recommending clone refactoring instances for each target system.
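A minimal sketch of the train-and-evaluate loop, assuming purely numeric clone features. It is not the authors' implementation: a depth-1 decision tree (a stump) stands in for the full decision tree, and the function names and toy data are hypothetical.

```python
import random
from collections import Counter

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k (train, test) index pairs."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # round-robin split: fold sizes differ by at most 1
    splits = []
    for i, test in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        splits.append((train, test))
    return splits

def fit_stump(X, y):
    """Depth-1 decision tree: choose the (feature, threshold) split with the fewest training errors."""
    majority = Counter(y).most_common(1)[0][0]
    # Baseline: no split (threshold +inf), always predict the majority label.
    best = (sum(lab != majority for lab in y), 0, float("inf"), majority, majority)
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold midway between adjacent distinct values
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            errors = sum(lab != l_lab for lab in left) + sum(lab != r_lab for lab in right)
            if errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    _, f, t, l_lab, r_lab = best
    return lambda x: l_lab if x[f] <= t else r_lab

def cross_val_accuracy(X, y, k, fit):
    """Mean test-fold accuracy; fit(X_train, y_train) must return a predict(x) function."""
    accs = []
    for train, test in k_fold_indices(len(y), k):
        predict = fit([X[i] for i in train], [y[i] for i in train])
        accs.append(sum(predict(X[i]) == y[i] for i in test) / len(test))
    return sum(accs) / len(accs)
```

Every sample lands in exactly one test fold, so each clone is evaluated once by a model that never saw it during training.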

Feature selection model for harmfulness prediction of clone code

WANG Huan, ZHANG Liping, YAN Sheng, LIU Dongsheng

Journal of Computer Applications 2017, 37 (4): 1135-1142. DOI: 10.11772/j.issn.1001-9081.2017.04.1135

To address irrelevant and redundant features in the harmfulness prediction of clone code, a combined feature selection model for code clone harmfulness was proposed based on relevance and influence. First, features were preliminarily ranked by relevance using the information gain ratio; highly relevant features were kept and irrelevant ones removed to reduce the feature search space. Next, the optimal feature subset was determined using the wrapper-based sequential floating forward selection algorithm combined with six classifiers, including Naive Bayes. Finally, different feature selection methods were compared, and the feature data were analyzed, filtered and optimized by exploiting the strengths of each method under different selection criteria. Experimental results show that prediction accuracy increases by 15.2 to 34 percentage points after feature selection; compared with other feature selection methods, the F1-measure of this method is 1.1 to 10.1 percentage points higher and the AUC is 0.7 to 22.1 percentage points higher. This method can therefore substantially improve the accuracy of the harmfulness prediction model.
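The first, filter-style ranking stage can be sketched as follows, assuming every feature has already been discretized. Only the information gain ratio ranking is shown; the wrapper sequential floating forward selection stage and the six classifiers are not reproduced, and all names are illustrative.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def gain_ratio(feature, labels):
    """Information gain of a discrete feature w.r.t. the labels, divided by its split information."""
    n = len(labels)
    groups = {}
    for v, lab in zip(feature, labels):
        groups.setdefault(v, []).append(lab)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - conditional
    split_info = entropy(feature)  # entropy of the feature's own value distribution
    return gain / split_info if split_info > 0 else 0.0

def top_features(columns, labels, keep):
    """Rank feature columns by gain ratio and return the names of the `keep` best."""
    scores = {name: gain_ratio(col, labels) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:keep]
```

Dividing by split information penalizes features with many distinct values, which plain information gain would otherwise favor.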

Clone group mapping method based on improved vector space model

CHEN Zhuo, ZHANG Liping, WANG Huan, ZHANG Jiujie, WANG Chunhui

Journal of Computer Applications 2016, 36 (7): 2031-2037. DOI: 10.11772/j.issn.1001-9081.2016.07.2031

Existing mapping methods for Type-3 code clones are few and inefficient. To address this, a mapping method based on an improved Vector Space Model (VSM) was proposed. The improved VSM was introduced into clone code analysis to obtain an effective clone group mapping method for Type-1, Type-2 and Type-3 clones. First, clone group documents were preprocessed to remove useless words, and features such as file names and function names were extracted at the same time. Second, the word-frequency vector space of each clone group was built, and the similarity between clone groups was calculated using cosine similarity. Then, the mapping between clone groups was constructed from clone group similarity and feature matching, yielding the final clone group mapping result. Five open-source software systems were tested in experiments. The proposed method achieves recall of no less than 96.1% and precision of no less than 97.1% at low time cost. The experimental results show that the proposed method is feasible and provides data support for the analysis of software evolution.
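A minimal sketch of the word-frequency VSM and cosine similarity step, assuming each clone group is available as one string of source text. The stop-word list, the 0.8 threshold and the best-match rule are hypothetical choices, and the file-name and function-name feature matching described above is omitted.

```python
import math
import re
from collections import Counter

STOP_WORDS = {"int", "return", "if", "else", "for", "while"}  # hypothetical "useless word" list

def term_vector(code):
    """Word-frequency vector of identifiers/keywords, with stop words removed."""
    tokens = re.findall(r"[A-Za-z_][A-Za-z0-9_]*", code)
    return Counter(t.lower() for t in tokens if t.lower() not in STOP_WORDS)

def cosine(a, b):
    """Cosine similarity of two sparse frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def map_groups(old, new, threshold=0.8):
    """Map each old clone group id to the most similar new group id at or above the threshold."""
    mapping = {}
    for oid, otext in old.items():
        ov = term_vector(otext)
        best_id, best_sim = None, threshold
        for nid, ntext in new.items():
            sim = cosine(ov, term_vector(ntext))
            if sim >= best_sim:
                best_id, best_sim = nid, sim
        if best_id is not None:
            mapping[oid] = best_id
    return mapping
```

Groups with no sufficiently similar successor stay unmapped, which a genealogy builder could treat as a disappeared clone group.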

Evolution pattern recognition and genealogy construction based on clone mapping of versions

ZHANG Jiujie, ZHAI Ye, WANG Chunhui, ZHANG Liping, LIU Dongsheng

Journal of Computer Applications 2016, 36 (7): 2021-2030. DOI: 10.11772/j.issn.1001-9081.2016.07.2021

To address the complexity of existing methods for building clone genealogies and the need to extend the set of evolution patterns, new clone evolution patterns were proposed, and clone genealogies were built automatically from the mapping relationships of code clones between versions. First, topics of code clones were extracted with Latent Dirichlet Allocation (LDA) from the clone detection results of each released software version. Second, the mapping relationships of code clones between versions were determined from topic similarity. Third, evolution patterns were assigned to code clones according to these mapping relationships, and their evolution characteristics were analyzed. Finally, clone genealogies were built by integrating the mapping relationships and evolution patterns. Clone genealogy construction experiments were conducted on four open-source systems. The experimental results show that the proposed approach is feasible and that the proposed evolution patterns do occur during software evolution. Furthermore, about 90% of code clones in the studied systems remain stable during evolution, and approximately 67% of clone groups survive fewer than half of the release versions. These findings and the related analysis provide strong support for future research on, and the maintenance and management of, code clones.

Info-association topology based social relationship mining on Internet

LIU Jinwen, XING Kai, RUI Weikang, ZHANG Liping, ZHOU Hui

Journal of Computer Applications 2016, 36 (7): 1875-1880. DOI: 10.11772/j.issn.1001-9081.2016.07.1875

Relation extraction methods based on supervised learning require large amounts of labeled training data and predefined relation types. To avoid both, a personal relation extraction method was proposed that builds a correlation network from word co-occurrence information and performs graph clustering on it. First, 500 highly related person pairs were obtained from news headline data as the subjects of relation extraction. Second, news articles containing these person pairs were crawled and preprocessed, and keywords in the sentences containing each pair were selected using Term Frequency-Inverse Document Frequency (TF-IDF). Third, the correlation between words was computed from word co-occurrence information, and a keyword correlation network was constructed. Finally, personal relations were obtained by graph clustering on the correlation network. In relation extraction experiments, compared with a traditional Chinese relation extraction algorithm based on word co-occurrence and pattern matching, the precision, recall and F-score of the proposed method improved by 5.5, 3.7 and 4.4 percentage points respectively. The experimental results show that the proposed algorithm can effectively extract abundant, high-quality personal relation data from news without labeled training data.
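The TF-IDF keyword step and the co-occurrence counts that weight the keyword network can be sketched as below, assuming sentences are already segmented into word lists. This uses one common TF-IDF variant (relative term frequency times natural-log IDF, no smoothing); the paper's exact weighting, thresholds and graph clustering step are not shown, and the toy tokens are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

def tf_idf(docs):
    """docs: list of token lists. Returns one {term: score} dict per document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))  # document frequency per term
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append({t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()})
    return scores

def cooccurrence(sentences, keywords):
    """Count how often two keywords appear in the same sentence (edge weights of the network)."""
    edges = Counter()
    for sent in sentences:
        present = sorted(set(sent) & keywords)
        for a, b in combinations(present, 2):
            edges[(a, b)] += 1
    return edges
```

Sorting the keywords before pairing stores each undirected edge under a single key.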

Solution for classification imbalance in harmfulness prediction of clone code

WANG Huan, ZHANG Liping, YAN Sheng

Journal of Computer Applications 2016, 36 (12): 3468-3475. DOI: 10.11772/j.issn.1001-9081.2016.12.3468

To address the imbalance between harmful and harmless samples when predicting the harmfulness of clone code, a K-Balance algorithm based on Random Under-Sampling (RUS) was proposed that adjusts the class imbalance automatically. First, a sample data set was constructed by extracting static and evolution features of clone code. Then, new data sets with different class imbalance proportions were selected. Next, harmfulness prediction was carried out on each newly selected data set. Finally, the most suitable imbalance percentage was chosen automatically by comparing the classifier's performance across these data sets. The harmfulness prediction model was evaluated on seven open-source software systems of different types, comprising 170 versions written in C. Compared with other class imbalance solutions, the proposed method improves the classification performance for harmful and harmless clones, measured by the Area Under the ROC (Receiver Operating Characteristic) Curve (AUC), by 2.62 to 36.7 percentage points. The proposed method can effectively improve prediction under class imbalance.
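The core idea, under-sampling the majority class to several candidate ratios and keeping the ratio whose resampled set scores best, can be sketched as below. This is an illustration in the spirit of the method, not the published K-Balance algorithm; `evaluate` is a hypothetical callback standing in for training a classifier and measuring AUC.

```python
import random

def random_under_sample(samples, labels, minority, ratio, seed=0):
    """Keep every minority sample plus a random subset of majority samples,
    sized so that majority count is about ratio * minority count."""
    rng = random.Random(seed)
    minor = [i for i, y in enumerate(labels) if y == minority]
    major = [i for i, y in enumerate(labels) if y != minority]
    k = min(len(major), round(ratio * len(minor)))
    keep = sorted(minor + rng.sample(major, k))
    return [samples[i] for i in keep], [labels[i] for i in keep]

def choose_ratio(samples, labels, minority, ratios, evaluate):
    """Try each candidate ratio and return the one whose resampled set scores highest.
    evaluate(X, y) is a hypothetical scoring callback (e.g. cross-validated AUC)."""
    return max(ratios, key=lambda r: evaluate(*random_under_sample(samples, labels, minority, r)))
```

A fixed seed makes each candidate ratio reproducible; in practice one would average the score over several seeds.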

Clone genealogy extraction method based on software code evolution information

CHEN Zhuo, ZHANG Liping, WANG Chunhui

Journal of Computer Applications 2016, 36 (12): 3461-3467. DOI: 10.11772/j.issn.1001-9081.2016.12.3461

Current classifications of clone evolution patterns are unclear, and clone genealogy extraction tools are few and inefficient. To address these problems, a clone genealogy extraction method was proposed based on code clone mapping relationships and evolution information. First, clone groups and clone fragments were mapped in different stages using word-frequency vector calculation, code line distance and clone attributes. Then, evolution patterns were assigned to clone groups and clone fragments according to the mapping results. Finally, clone genealogies were constructed by combining the clone mapping relationships and evolution patterns across all versions. Four open-source software systems were tested and the results were manually verified. The experimental results show that the clone genealogy extraction tool Extract Clone Genealogy (ECG) is valid and efficient. In addition, the extraction results show that about 42% of cloned code did not change during evolution and about 3.48% underwent inconsistent changes; such clones may introduce latent bugs and deserve attention. The proposed method provides reference and data support for code clone quality assessment and management.

Harmfulness prediction of clone code based on Bayesian network

ZHANG Liping, ZHANG Ruixia, WANG Huan, YAN Sheng

Journal of Computer Applications 2016, 36 (1): 260-265. DOI: 10.11772/j.issn.1001-9081.2016.01.0260

During software development, programmer activities such as copying and pasting produce many code clones, and inconsistent changes to these clones are often harmful to programs. To find harmful code clones effectively, a method was proposed to predict them using a Bayesian network. First, drawing on related research on software defect prediction and clone evolution, two kinds of software metrics, static metrics and evolution metrics, were proposed to characterize code clones. Then, the prediction model was constructed using the core Bayesian network algorithm. Finally, the probability that a code clone is harmful was predicted. The prediction model was evaluated on five open-source software systems of different types, comprising 99 versions written in C. The experimental results show that the proposed method predicts clone harmfulness with good applicability and high accuracy, which helps reduce the threat of harmful code clones and improve software quality.
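As a rough illustration of predicting the probability that a clone is harmful, the sketch below uses naive Bayes, the simplest Bayesian network structure (the class node is the sole parent of every metric node). This is a swapped-in technique: the paper's network structure, metrics and learning algorithm are not given here, and the metric name `changed` is hypothetical.

```python
from collections import Counter, defaultdict

def fit_naive_bayes(rows, labels, alpha=1.0):
    """rows: list of dicts of discrete metric values; labels: class per row.
    Returns predict_proba(row) -> {class: posterior probability}."""
    classes = Counter(labels)
    counts = defaultdict(Counter)  # (class, feature) -> Counter of observed values
    values = defaultdict(set)      # feature -> set of observed values
    for row, y in zip(rows, labels):
        for f, v in row.items():
            counts[(y, f)][v] += 1
            values[f].add(v)
    n = len(labels)

    def predict_proba(row):
        scores = {}
        for c, nc in classes.items():
            p = nc / n  # prior P(class)
            for f, v in row.items():
                # Laplace-smoothed P(value | class), so unseen values do not zero the product
                p *= (counts[(c, f)][v] + alpha) / (nc + alpha * len(values[f]))
            scores[c] = p
        total = sum(scores.values())
        return {c: s / total for c, s in scores.items()}

    return predict_proba
```

A full Bayesian network would also allow dependencies between metrics (for example between static and evolution metrics), which naive Bayes assumes away.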

Clone genealogies extraction based on software evolution over multiple versions

TU Ying, ZHANG Liping, WANG Chunhui, HOU Min, LIU Dongsheng

Journal of Computer Applications 2015, 35 (4): 1169-1173. DOI: 10.11772/j.issn.1001-9081.2015.04.1169

Because clone detection results alone cannot fully reflect the characteristics of clones, extracting clone genealogies across multiple versions can reveal the patterns and characteristics that clones exhibit as a system evolves. A clone genealogy extraction method named FCG was proposed. FCG first mapped clones between each pair of adjacent versions and then identified clone evolution patterns; all of these results were combined to obtain clone genealogies. Experiments on six open-source systems found that the average lifetime of clones in the current version exceeds 70 percent of the total number of studied versions and that most of these clones do not change, indicating that the majority of clones are well maintained. However, some unstable clones may be potential sources of defects and need to be modified or refactored. The results show that FCG can efficiently extract clone genealogies, which contributes to a better understanding of clones and informs their targeted management.
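Chaining adjacent-version mappings into genealogies can be sketched as follows, assuming the per-version clone mappings are already computed and are one-to-one. This is not FCG itself: splits, merges and evolution-pattern labeling are left out, and the group identifiers are hypothetical.

```python
def build_genealogies(versions, mappings):
    """versions[i]: list of clone-group ids in version i.
    mappings[i]: dict mapping a group id in version i to its successor in version i + 1.
    Returns one lineage, a list of (version, group id), per group with no predecessor."""
    has_pred = set()
    for i, m in enumerate(mappings):
        has_pred.update((i + 1, g) for g in m.values())
    genealogies = []
    for v, groups in enumerate(versions):
        for g in groups:
            if (v, g) in has_pred:
                continue  # this group continues an older lineage
            lineage, cur, i = [(v, g)], g, v
            while i < len(mappings) and cur in mappings[i]:
                cur = mappings[i][cur]
                i += 1
                lineage.append((i, cur))
            genealogies.append(lineage)
    return genealogies
```

The length of each lineage is the clone's lifetime in versions, the quantity behind lifetime statistics such as the 70 percent figure above.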

Clone code detection based on Levenshtein distance of token

ZHANG Jiujie, WANG Chunhui, ZHANG Liping, HOU Min, LIU Dongsheng

Journal of Computer Applications 2015, 35 (12): 3536-3543. DOI: 10.11772/j.issn.1001-9081.2015.12.3536

Current detection tools for Type-3 code clones are few and inefficient. To address this, an effective Type-3 clone detection method based on the Levenshtein distance of tokens was proposed; it also detects Type-1 and Type-2 clones efficiently. First, the source code of the subject system was tokenized into token sequences of a specified code size. Second, each fixed-size substring of the token sequences was mapped to a corresponding index. Third, using queries on this mapping information, clone pairs were built with the Levenshtein distance algorithm and clone groups were built with the disjoint-set algorithm. Finally, the detected clones were reported as feedback. A prototype tool named FClones was implemented, evaluated with a code mutation-based framework, and compared with two state-of-the-art tools, SimCad and NiCad. The experimental results show that FClones achieves recall of at least 95% and precision of at least 98% for all three clone types, and that it detects Type-3 clones better than the other tools.
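The pair-building and grouping steps can be sketched as below, assuming fragments are already tokenized with identifiers normalized. This compares all fragment pairs directly instead of using FClones' index-based lookup, and the `max_ratio` threshold is a hypothetical parameter; it shows only token-level Levenshtein distance plus disjoint-set (union-find) grouping.

```python
def levenshtein(a, b):
    """Edit distance between two token sequences (insert, delete, substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]

class DisjointSet:
    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

def clone_groups(fragments, max_ratio=0.3):
    """Treat pairs whose distance is at most max_ratio * (longer length) as clone pairs,
    then merge pairs transitively into clone groups (groups of size >= 2)."""
    ds = DisjointSet(len(fragments))
    for i in range(len(fragments)):
        for j in range(i + 1, len(fragments)):
            a, b = fragments[i], fragments[j]
            if levenshtein(a, b) <= max_ratio * max(len(a), len(b)):
                ds.union(i, j)
    groups = {}
    for i in range(len(fragments)):
        groups.setdefault(ds.find(i), []).append(i)
    return [g for g in groups.values() if len(g) > 1]
```

The union-find step makes grouping transitive: if A clones B and B clones C, all three land in one group even when A and C alone exceed the threshold.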

Construction of rectangle trapezoid circle tree and indeterminate near neighbor relations query

LI Song, LI Lin, WANG Miao, CUI Huanyu, ZHANG Liping

Journal of Computer Applications 2015, 35 (1): 115-120. DOI: 10.11772/j.issn.1001-9081.2015.01.0115

Spatial index structures and query techniques play an important role in spatial databases. To overcome the shortcomings of existing methods in approximating and organizing complex spatial objects, a new index structure based on the Minimum Bounding Rectangle (MBR), trapezoid and circle, the RTC (Rectangle Trapezoid Circle) tree, was proposed. To handle Nearest Neighbor (NN) queries over complex spatial objects efficiently, the NN query based on RTC tree (NNRTC) algorithm was given; it reduces node traversal and distance calculation through pruning rules. To account for the influence of obstacles in the spatial data set, the barrier-NN query based on RTC tree (BNNRTC) algorithm was proposed; it first queries in an ideal space and then checks the query result. To handle dynamic simple continuous NN chain queries, the Simple Continuous NN chain query based on RTC tree (SCNNCRTC) algorithm was given. The experimental results show that, on large data sets of complex spatial objects, the proposed methods improve query efficiency by 60%-80% compared with query methods based on the R-tree.
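The MBR-based pruning idea behind such NN searches can be sketched with plain rectangles: visit index nodes in increasing MINDIST order and stop once the nearest remaining node cannot beat the best distance found. This is a simplified one-level stand-in for an R-tree style search, not the RTC tree; trapezoid and circle approximations, obstacles and chain queries are omitted, and the node layout is hypothetical.

```python
import math

def mindist(p, rect):
    """Smallest possible distance from point p to any point in rect = (xmin, ymin, xmax, ymax)."""
    x, y = p
    xmin, ymin, xmax, ymax = rect
    dx = max(xmin - x, 0.0, x - xmax)
    dy = max(ymin - y, 0.0, y - ymax)
    return math.hypot(dx, dy)

def mbr(points):
    """Minimum bounding rectangle of a non-empty list of (x, y) points."""
    xs, ys = zip(*points)
    return (min(xs), min(ys), max(xs), max(ys))

def nearest(p, nodes):
    """nodes: list of point lists (leaf entries). Returns (best_point, best_dist, nodes_visited)."""
    best, best_d, visited = None, math.inf, 0
    ordered = sorted(((mindist(p, mbr(pts)), pts) for pts in nodes), key=lambda t: t[0])
    for d, pts in ordered:
        if d >= best_d:
            break  # prune: no remaining node can contain a closer point
        visited += 1
        for q in pts:
            dq = math.dist(p, q)
            if dq < best_d:
                best, best_d = q, dq
    return best, best_d, visited
```

Because MINDIST is a lower bound on the distance to any point inside a rectangle, stopping at the first node whose bound exceeds the current best never discards the true nearest neighbor.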

Predicting inconsistent change probability of code clone based on latent Dirichlet allocation model

YI Lili, ZHANG Liping, WANG Chunhui, TU Ying, LIU Dongsheng

Journal of Computer Applications 2014, 34 (6): 1788-1791. DOI: 10.11772/j.issn.1001-9081.2014.06.1788

Programmer activities such as copying, pasting and modifying code produce many code clones in software systems. Inconsistent changes to code clones are a main cause of program errors and increased maintenance costs as software versions evolve. To address this, a new research method was proposed. First, mapping relationships between clone groups were built. Then, the topics of each clone lineage were extracted using the Latent Dirichlet Allocation (LDA) model. Finally, the probability of inconsistent change of code clones was predicted. Software with eight versions was tested, and the predictions showed clear discrimination. The experimental results show that the method can effectively predict the probability of inconsistent change and can be used to evaluate software quality and credibility.

Simple continuous near neighbor chain query in constrained regions

ZHANG Liping, LI Song, ZHAO Jiqiao, HAO Xiaohong

Journal of Computer Applications 2014, 34 (2): 406-410.

Existing nearest neighbor query methods cannot find simple continuous near neighbor chains in constrained regions. To remedy this, and considering the complexity of constrained regions and obstacles, simple continuous near neighbor chain queries without obstacles and with obstacles were studied separately, and the VOR_NB_CRSCNNC and VOR_CB_CRSCNNC algorithms were presented. Spatial data were filtered and computed using the Voronoi diagram and judging circles, and the computation for each query was reduced by considering only the points lying in the Voronoi polygon and the judging circles. Theoretical analysis and experimental results show that redundant computation is reduced and that query efficiency is little affected by the constrained regions.